UnSplit: Data-Oblivious Model Inversion, Model Stealing, and Label Inference Attacks Against Split Learning. (arXiv:2108.09033v2 [cs.CR] UPDATED)
Training deep neural networks often forces users to work in a distributed or
outsourced setting, accompanied by privacy concerns. Split learning aims to
address this concern by distributing the model between a client and a server. The
scheme supposedly provides privacy, since the server cannot see the clients'
models and inputs. We show that this is not true via two novel attacks. (1) We
show that an honest-but-curious split learning server, equipped only with the
knowledge of the client neural network architecture, can recover the input
samples and obtain a model functionally similar to the client model, without
being detected (sketched below). (2) We show that if the client keeps only the
output layer of the model hidden in order to "protect" the private labels, the
honest-but-curious server can infer the labels with perfect accuracy (see the
second sketch below). We test our attacks using
various benchmark datasets and against proposed privacy-enhancing extensions to
split learning. Our results show that plaintext split learning can pose serious
risks to both data (input) privacy and intellectual property (the model
parameters), and provides no more than a false sense of security.
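
Below is a minimal, illustrative sketch of the idea behind attack (1), written in
PyTorch. The toy ClientNet, the hyperparameters, and the total-variation
regularizer are assumptions made for this example rather than the paper's exact
procedure: the point is only that a server that sees the intermediate ("smashed")
activations and knows the client architecture can alternately optimize a dummy
input and a clone of the client model until the clone reproduces those
activations, thereby recovering both the input and a functionally similar model.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ClientNet(nn.Module):
    # Stand-in for the client's hidden model portion; only its architecture
    # is assumed to be known to the server.
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(1, 8, 3, padding=1)

    def forward(self, x):
        return F.relu(self.conv(x))

def total_variation(x):
    # Smoothness prior on the recovered input (an assumed regularizer).
    return (x[..., 1:, :] - x[..., :-1, :]).abs().mean() + \
           (x[..., :, 1:] - x[..., :, :-1]).abs().mean()

def unsplit_inversion(smashed, steps=100, inner=5, tv_weight=0.1):
    # Jointly recover a dummy input and a functional clone of the client model
    # from one batch of intermediate ("smashed") activations.
    clone = ClientNet()                                   # server-side clone
    x_hat = torch.zeros(smashed.size(0), 1, 28, 28, requires_grad=True)
    opt_x = torch.optim.Adam([x_hat], lr=0.01)
    opt_m = torch.optim.Adam(clone.parameters(), lr=0.01)
    for _ in range(steps):
        for _ in range(inner):                            # update the dummy input
            opt_x.zero_grad()
            loss = F.mse_loss(clone(x_hat), smashed) + tv_weight * total_variation(x_hat)
            loss.backward()
            opt_x.step()
        for _ in range(inner):                            # update the clone's weights
            opt_m.zero_grad()
            F.mse_loss(clone(x_hat), smashed).backward()
            opt_m.step()
    return x_hat.detach(), clone

# Toy run: the "victim" client produces activations, the server inverts them.
victim = ClientNet()
x_true = torch.rand(4, 1, 28, 28)
recovered_inputs, stolen_model = unsplit_inversion(victim(x_true).detach())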
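
The second sketch illustrates why attack (2) is possible at all. For simplicity
it assumes the cut is placed directly at the logits, so the gradient the client
returns during backpropagation is softmax(logits) minus the one-hot label, whose
only negative entry sits at the true class. The paper's attack covers the harder
setting described above, where the client also holds the weights of the output
layer, but the leakage mechanism is the same: the gradients sent back to the
server are a function of the private labels. All names and shapes here are
illustrative assumptions.

import torch
import torch.nn.functional as F

def returned_gradient(logits, labels):
    # What the client sends back when the cut sits right at the logits:
    # d CrossEntropy / d logits = softmax(logits) - one_hot(labels),
    # up to a positive scaling from the mean reduction.
    logits = logits.detach().requires_grad_(True)
    F.cross_entropy(logits, labels).backward()
    return logits.grad.detach()

def infer_labels(grad):
    # Server-side inference: softmax outputs are strictly positive, so the only
    # negative entry in each row of the gradient is at the true class.
    return grad.argmin(dim=1)

# Toy run with a batch of 5 examples and 10 classes.
logits = torch.randn(5, 10)
labels = torch.randint(0, 10, (5,))
print("true:    ", labels.tolist())
print("inferred:", infer_labels(returned_gradient(logits, labels)).tolist())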